Wake-Sleep Algorithm


Generative Flow Networks: Theory and Applications to Structure Learning

Deleu, Tristan

arXiv.org Artificial Intelligence

Without any assumptions about data generation, multiple causal models may explain our observations equally well. To avoid selecting a single arbitrary model that could result in unsafe decisions if it does not match reality, it is therefore essential to maintain a notion of epistemic uncertainty about our possible candidates. This thesis studies the problem of structure learning from a Bayesian perspective, approximating the posterior distribution over the structure of a causal model, represented as a directed acyclic graph (DAG), given data. It introduces Generative Flow Networks (GFlowNets), a novel class of probabilistic models designed for modeling distributions over discrete and compositional objects such as graphs. They treat generation as a sequential decision-making problem, constructing samples of a target distribution, defined up to a normalization constant, piece by piece. In the first part of this thesis, we present the mathematical foundations of GFlowNets, their connections to existing domains of machine learning and statistics such as variational inference and reinforcement learning, and their extensions beyond discrete problems. In the second part, we show how GFlowNets can approximate the posterior distribution over the DAG structures of causal Bayesian networks, along with the parameters of their causal mechanisms, given observational and experimental data.
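The "sequential decision making" view described in the abstract can be illustrated with a minimal sketch. This is not the GFlowNet training algorithm (which learns the forward policy so that complete objects are sampled proportionally to the reward); it only shows the piece-by-piece construction of a discrete object under an unnormalized reward. The state space, policy, and reward below are all hypothetical stand-ins.

```python
import random

# Toy state space: build a binary string of length 3 one bit at a time.
# A trained GFlowNet would learn a forward policy P_F so that complete
# strings are sampled in proportion to an unnormalized reward R(x);
# here the policy is a fixed illustrative table.

def reward(x):
    # Unnormalized target density: favor strings with more ones.
    return 1.0 + x.count("1")

def sample_trajectory(policy):
    state = ""                        # start from the empty object
    while len(state) < 3:             # append one piece per step
        p_one = policy(state)         # probability of appending "1"
        state += "1" if random.random() < p_one else "0"
    return state

uniform_policy = lambda s: 0.5
samples = [sample_trajectory(uniform_policy) for _ in range(5)]
```

Training would then adjust the policy (e.g. via a flow-matching or trajectory-balance objective) so that the empirical frequency of each string approaches `reward(x)` divided by the normalization constant.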


Reviews: Flexible and accurate inference and learning for deep generative models

Neural Information Processing Systems

This paper presents an alternative to variational autoencoders and other latent-variable generative models that rely on the wake-sleep algorithm for training. The main problem with the wake-sleep algorithm is its bias: the recognition model has different conditional independencies than the generative model, and it is trained to optimize a different objective. The proposed model, DDC-HM, addresses this by instead working with sufficient statistics, using them to implicitly define the maximum-entropy distribution consistent with those statistics. The measurements chosen are random functions. The method is evaluated on synthetic data and two small vision datasets (image patches and MNIST), comparing against two baselines using the MMD metric. I don't know the related work well enough to evaluate the novelty with confidence.


Does the Wake-sleep Algorithm Produce Good Density Estimators?

Neural Information Processing Systems

The wake-sleep algorithm (Hinton, Dayan, Frey and Neal 1995) is a relatively efficient method of fitting a multilayer stochastic generative model to high-dimensional data. In addition to the top-down connections in the generative model, it makes use of bottom-up connections for approximating the probability distribution over the hidden units given the data, and it trains these bottom-up connections using a simple delta rule. We use a variety of synthetic and real data sets to compare the performance of the wake-sleep algorithm with Monte Carlo and mean field methods for fitting the same generative model and also compare it with other models that are less powerful but easier to fit.
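The "simple delta rule" for the bottom-up connections can be sketched as follows. This shows only the sleep phase of wake-sleep for a single hidden layer: the generative model samples a (hidden, visible) "dream" pair, and the recognition weights are nudged toward predicting the hidden cause from the visible sample. Layer sizes, learning rate, and the random generative weights are illustrative, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

n_hidden, n_visible, lr = 4, 6, 0.1
G = rng.normal(0.0, 0.5, (n_visible, n_hidden))  # generative (top-down) weights
R = np.zeros((n_hidden, n_visible))              # recognition (bottom-up) weights

for _ in range(200):
    # Sleep phase: the generative model "dreams" a hidden/visible pair.
    h = (rng.random(n_hidden) < 0.5).astype(float)
    v = (rng.random(n_visible) < sigmoid(G @ h)).astype(float)
    # Delta rule: move the recognition prediction q toward the true cause h.
    q = sigmoid(R @ v)
    R += lr * np.outer(h - q, v)
```

The wake phase (not shown) updates `G` symmetrically, using the recognition network to infer hidden states for real data.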


Convergence of the Wake-Sleep Algorithm

Ikeda, Shiro, Amari, Shun-ichi, Nakahara, Hiroyuki

Neural Information Processing Systems

The W-S (Wake-Sleep) algorithm is a simple learning rule for models with hidden variables. It has been shown that this algorithm can be applied to a factor analysis model, which is a linear version of the Helmholtz machine. But even for a factor analysis model, convergence has not been proved theoretically in general. In this article, we describe a geometrical understanding of the W-S algorithm in contrast with the EM (Expectation-Maximization) algorithm and the em algorithm. As a result, we prove the convergence of the W-S algorithm for the factor analysis model. We also give a condition for convergence in general models.


Natural Wake-Sleep Algorithm

Várady, Csongor, Volpi, Riccardo, Malagò, Luigi, Ay, Nihat

arXiv.org Machine Learning

The benefits of using the natural gradient are well known in a wide range of optimization problems. However, for the training of common neural networks, the resulting increase in computational complexity limits its practical application. Helmholtz Machines are a particular type of generative model composed of two Sigmoid Belief Networks (SBNs), acting as an encoder and a decoder, commonly trained using the Wake-Sleep (WS) algorithm and its reweighted version, RWS. For SBNs, it has been shown how the locality of the connections in the graphical structure induces sparsity in the Fisher information matrix. The resulting block-diagonal structure can be efficiently exploited to reduce the computational complexity of the Fisher matrix inversion and thus compute the natural gradient exactly, without the need for approximations. We present a geometric adaptation of well-known methods from the literature, introducing the Natural Wake-Sleep (NWS) and the Natural Reweighted Wake-Sleep (NRWS) algorithms. We present an experimental analysis of the novel geometric algorithms based on convergence speed and the value of the log-likelihood, with respect to both the number of iterations and the time complexity, demonstrating improvements on these aspects over their respective non-geometric baselines.
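The computational point in the abstract can be sketched generically: when the Fisher information matrix is block diagonal (one block per unit, as the paper notes for SBNs), the natural gradient F⁻¹g can be computed by solving each block independently instead of inverting the full matrix. The blocks, gradients, and damping term below are illustrative stand-ins, not the paper's exact procedure.

```python
import numpy as np

def natural_gradient(fisher_blocks, grad_blocks, damping=1e-3):
    """Compute F^{-1} g block by block, with Tikhonov damping for stability.

    For a block-diagonal Fisher matrix, each linear solve costs O(b^3)
    per block of size b, instead of O(n^3) for the full n x n matrix.
    """
    return [np.linalg.solve(F + damping * np.eye(F.shape[0]), g)
            for F, g in zip(fisher_blocks, grad_blocks)]

rng = np.random.default_rng(0)
A = rng.normal(size=(3, 3))
blocks = [A @ A.T + np.eye(3)]   # a symmetric positive-definite toy block
grads = [rng.normal(size=3)]
step = natural_gradient(blocks, grads)
```

Each entry of `step` would then be used in place of the corresponding raw gradient block in the parameter update.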


An adversarial algorithm for variational inference with a new role for acetylcholine

Benjamin, Ari S., Kording, Konrad P.

arXiv.org Machine Learning

Sensory learning in the mammalian cortex has long been hypothesized to involve the objective of variational inference (VI). Likely the best-known algorithm for cortical VI is the Wake-Sleep algorithm (Hinton et al. 1995). However, Wake-Sleep problematically assumes that neural activities are independent given lower layers during generation. Here, we construct a VI system that is both compatible with neurobiology and avoids this assumption. The core of the system is a wake-sleep discriminator that classifies network states as inferred or self-generated. Inference connections learn by opposing this discriminator. This adversarial dynamic solves a core problem within VI: matching the distribution of stimulus-evoked (inference) activity to that of self-generated activity. Meanwhile, generative connections learn to predict lower-level activity, as in standard VI. We implement this algorithm and show that it can successfully train the approximate inference network for generative models. Our proposed algorithm makes several biological predictions that can be tested. Most importantly, it predicts a teaching signal that is remarkably similar to known properties of the cholinergic system.
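The discriminator idea can be illustrated with a minimal sketch, assuming the simplest possible setup: a logistic classifier tries to tell stimulus-evoked ("wake") network states from self-generated ("sleep") states; when the two distributions match, its confidence stays near chance, which is the signal the inference connections exploit. The Gaussian activities and learning rate are hypothetical, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

w = np.zeros(2)  # weights of a logistic wake-vs-sleep discriminator
for _ in range(500):
    wake = rng.normal([1.0, -1.0], 1.0)   # inferred (stimulus-evoked) state
    sleep = rng.normal([1.0, -1.0], 1.0)  # self-generated state, same dist.
    for x, label in ((wake, 1.0), (sleep, 0.0)):
        w += 0.05 * (label - sigmoid(w @ x)) * x  # logistic-regression update

# With matched distributions, the discriminator cannot separate the
# two sources, so its prediction on a fresh state hovers near 0.5.
p_wake = sigmoid(w @ rng.normal([1.0, -1.0], 1.0))
```

In the paper's full system the inference network is trained against this discriminator, driving the wake distribution toward the sleep distribution rather than the two being matched by construction as above.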



Neural characterization in partially observed populations of spiking neurons

Pillow, Jonathan W., Latham, Peter E.

Neural Information Processing Systems

Point process encoding models provide powerful statistical methods for understanding the responses of neurons to sensory stimuli. Although these models have been successfully applied to neurons in the early sensory pathway, they have fared less well capturing the response properties of neurons in deeper brain areas, owing in part to the fact that they do not take into account multiple stages of processing. Here we introduce a new twist on the point-process modeling approach: we include unobserved as well as observed spiking neurons in a joint encoding model. The resulting model exhibits richer dynamics and more highly nonlinear response properties, making it more powerful and more flexible for fitting neural data. More importantly, it allows us to estimate connectivity patterns among neurons (both observed and unobserved), and may provide insight into how networks process sensory input. We formulate the estimation procedure using variational EM and the wake-sleep algorithm, and illustrate the model's performance using a simulated example network consisting of two coupled neurons.
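The simulated setting described at the end of the abstract, a small network with observed and unobserved spiking neurons, can be sketched as follows. This is a toy discrete-time Bernoulli approximation of a point-process model, not the paper's estimation procedure; the stimulus filter, coupling weight, and bias are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

# Two coupled neurons: a hidden neuron driven by the stimulus, and an
# observed neuron driven by the hidden neuron's previous spike.
T = 200
stim = rng.normal(size=T)
w_stim, w_couple, bias = 1.5, 2.0, -2.0

spikes_hidden = np.zeros(T)
spikes_obs = np.zeros(T)
for t in range(1, T):
    spikes_hidden[t] = rng.random() < sigmoid(bias + w_stim * stim[t])
    spikes_obs[t] = rng.random() < sigmoid(bias + w_couple * spikes_hidden[t - 1])
```

Fitting the model from `spikes_obs` alone then requires marginalizing over `spikes_hidden`, which is where the paper's variational EM / wake-sleep machinery comes in.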

